 exponential family


Private Statistical Estimation via Truncation

Neural Information Processing Systems

We introduce a novel framework for differentially private (DP) statistical estimation via data truncation, addressing a key challenge in DP estimation when the data support is unbounded. Traditional approaches rely on problem-specific sensitivity analysis, limiting their applicability. By leveraging techniques from truncated statistics, we develop computationally efficient DP estimators for exponential family distributions, including Gaussian mean and covariance estimation, achieving near-optimal sample complexity. Previous works on exponential families only consider bounded or one-dimensional families. Our approach mitigates sensitivity through truncation while carefully correcting for the introduced bias using maximum likelihood estimation and DP stochastic gradient descent. Along the way, we establish improved uniform convergence guarantees for the log-likelihood function of exponential families, which may be of independent interest. Our results provide a general blueprint for DP algorithm design via truncated statistics.


The Quotient Bayesian Learning Rule

Neural Information Processing Systems

This paper introduces the Quotient Bayesian Learning Rule, an extension of natural-gradient Bayesian updates to probability models that fall outside the exponential family. Building on the observation that many heavy-tailed and otherwise non-exponential distributions arise as marginals of minimal exponential families, we prove that such marginals inherit a unique Fisher-Rao information geometry via the quotient-manifold construction. Exploiting this geometry, we derive the Quotient Natural Gradient algorithm, which takes steepest-descent steps in the well-structured covering space, thereby guaranteeing parameterization-invariant optimization in the target space. Empirical results on the Student-t distribution confirm that our method converges more rapidly and attains higher-quality solutions than previous variants of the Bayesian Learning Rule.


Finite Sample Bounds for Learning with Score Matching

arXiv.org Machine Learning

Learning continuous exponential family distributions with unbounded support remains an important area of research for both theory and applications in high-dimensional statistics. In recent years, score matching has become a widely used method for learning exponential families with continuous variables because it is computationally easier than maximum likelihood estimation. However, theoretical understanding of the statistical properties of score matching is still lacking. In this work, we provide a non-asymptotic sample complexity analysis for learning the structure of exponential families of polynomials with score matching. The derived sample bounds show a polynomial dependence on the model dimension. These bounds are the first of their kind, as all prior work has shown only asymptotic bounds on the sample complexity.




Appendix A Toy example

Neural Information Processing Systems

In this section, we provide and expand upon a toy example. Recall that the inputs x and x' need not correspond to real users but could instead represent hypothetical users. Example 5. Suppose that the regulatory guideline requires that users in the same geographical location receive similar weather forecasts. This can be written as "the weather forecasts that are selected by F should be similar for all users in the same geographical location". Here, S could be a randomly generated set of user pairs, where each pair corresponds to two (hypothetical) users in the same geographical location, and S could contain pairs across many locations. In the left-most panel, a filtering algorithm F takes in counterfactual inputs x and x' and produces the content Z and Z'. Because a counterfactual regulation requires that F behave similarly under x and x', the regulation is effectively requiring that the content Z and Z' be sufficiently similar (or, graphically, that they are close in Z). The question of how to quantify "similarity" is addressed in Section 2.1. The toy example in Example 5 is illustrated in the right-most panel.
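The check described in the toy example can be sketched in a few lines. This is only an illustration under stated assumptions: the function max_pair_gap, the filter toy_filter, the user fields "location" and "tier", and the use of the largest coordinate-wise absolute difference as the similarity measure are all hypothetical and are not the notion of similarity defined in Section 2.1.

```python
from typing import Callable, Dict, List, Sequence, Tuple

User = Dict[str, str]
Forecast = Sequence[float]


def max_pair_gap(
    F: Callable[[User], Forecast],
    pairs: List[Tuple[User, User]],
) -> float:
    """Return the largest coordinate-wise gap between F's outputs over all pairs.

    ASSUMPTION: "similarity" is measured by the max absolute difference;
    the source defers the actual definition to its Section 2.1.
    """
    worst = 0.0
    for x, x_prime in pairs:
        z, z_prime = F(x), F(x_prime)
        gap = max(abs(a - b) for a, b in zip(z, z_prime))
        worst = max(worst, gap)
    return worst


def toy_filter(user: User) -> List[float]:
    """Hypothetical filter: forecast is [temperature, rain probability]."""
    base = {"Boston": [12.0, 0.3], "Austin": [28.0, 0.1]}[user["location"]]
    # Hypothetical personalization that depends on something other than location.
    bump = 0.5 if user["tier"] == "premium" else 0.0
    return [base[0] + bump, base[1]]


# S: pairs of hypothetical users who share a geographical location.
S = [
    ({"location": "Boston", "tier": "free"}, {"location": "Boston", "tier": "premium"}),
    ({"location": "Austin", "tier": "free"}, {"location": "Austin", "tier": "free"}),
]

gap = max_pair_gap(toy_filter, S)
print(f"largest gap over S: {gap}")
```

An auditor could compare the returned gap against a tolerance of their choosing; the tolerance itself is also an assumption, since the excerpt does not specify one.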